Sequential ranking under random semi-bandit feedback
Abstract
In many web applications, a recommendation is not a single item suggested to a user but a list of potentially interesting items, which in some contexts is ranked. The combinatorial bandit problem has been studied extensively over the last two years, and many theoretical results now exist, including lower bounds on the regret and asymptotically optimal algorithms. However, because of the variety of situations that can be considered, these results are designed for a specific reward structure, such as the Cascade Model. The present work focuses on the problem of ranking items when the user is allowed to click on several items while scanning the list from top to bottom.
Similar resources
Lecture 9: (Semi-)bandits and experts with linear costs (part I)
In this lecture, we will study bandit problems with linear costs. In this setting, actions are represented by vectors in a low-dimensional real space. For simplicity, we will assume that all actions lie within a unit hypercube: a ∈ [0, 1]^d. The action costs c_t(a) are linear in the vector a, namely: c_t(a) = a · v_t for some weight vector v_t ∈ R^d which is the same for all actions, but depends on t...
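As a minimal sketch of the linear cost described in the excerpt above, the following Python snippet evaluates c_t(a) = a · v_t for a few actions in the unit hypercube. The function name `linear_cost`, the dimension d = 3, and the numeric values of `v_t` and `actions` are illustrative assumptions, not taken from the lecture.

```python
def linear_cost(action, weights):
    """Return c_t(a) = a . v_t for one action and one round's weight vector."""
    if len(action) != len(weights):
        raise ValueError("action and weight vector must have the same dimension")
    if any(x < 0 or x > 1 for x in action):
        raise ValueError("actions are assumed to lie in the unit hypercube [0, 1]^d")
    return sum(x * w for x, w in zip(action, weights))


# Hypothetical round with d = 3: the same v_t is applied to every action.
v_t = [0.2, 0.5, 0.1]
actions = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]

costs = [linear_cost(a, v_t) for a in actions]

# Index of the cheapest action in this round.
best = min(range(len(actions)), key=lambda i: costs[i])
```

Because the weight vector is shared by all actions within a round, comparing actions reduces to comparing dot products against a single vector.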
Combinatorial Multi-Armed Bandits with Filtered Feedback
Motivated by problems in search and detection we present a solution to a Combinatorial Multi-Armed Bandit (CMAB) problem with both heavy-tailed reward distributions and a new class of feedback, filtered semibandit feedback. In a CMAB problem an agent pulls a combination of arms from a set {1, ..., k} in each round, generating random outcomes from probability distributions associated with these ...
Combinatorial Bandits Revisited
This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound, and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret....
An Optimal Algorithm for Stochastic Matroid Bandit Optimization
The selection of leaders in leader-follower multi-agent systems can be naturally formulated as a matroid optimization problem. In this paper, we investigate the online and stochastic version of such a problem, where in each iteration or round, we select a set of leaders and then observe a random realization of the corresponding reward, i.e., of the system performance. This problem is referred t...
Semi-Bandit feedback: A survey of results
In this report, we survey recent results for semi-bandit problems in the stochastic setting. To motivate the semi-bandit setting, we describe the first paper that considers it in the adversarial setting. Apart from this, all results described in this survey concern the stochastic setting. A common theme is that almost all of the algorithms are based on a UCB (Upper Co...
Journal: CoRR
Volume: abs/1603.01450
Publication date: 2016